Introduction to Content Extraction


■ New technologies can find Essential 
Elements of Information in documents 

■ The Center for Content Extraction provides 
"one stop shopping" for these technologies 
at NSA 
Extraction can benefit SIGDEV from end to end


■ Selection 

■ Translation & Transliteration 

■ Analysis 

■ Interpretation/Enrichment 

■ Retrieval 

■ Storage & Distribution 
STAIRS Partners


S (Marina, CEA) 
T (Cybertrans) 

A (SNA/Paintball, Synapse) 

I (Nymrod, Thundercloud) 
R (Journeyman/CPE) 

S (GoldenRetriever, SocioPath)
Implementation: CCE Extraction Architecture (LexHound)


Subscription Based Customers - extracted report/transcript content

Marina (comms tracking)
Synapse/EKS (link analysis)
Nymrod (name matching)

Web Service On Demand Customers

LexHound Web Demo
CAMT (translation)
TKB (target knowledge base)
SNA (social network analysis)
GIS (geo mapping)
NTOC (terror cell tracking)
Heresyitch (UC collateral)
Golden Retriever (record building)
Elaboration: The Central Importance of Storage


□ Each of the STAIRS Steps exploits 
stored information 

■ Selection Dictionaries ("get it") 

■ Linguistic Glossaries for Translation 

■ Wikis etc for enrichment ("know it") 

□ Manual record-formation is slow, prone 
to omissions and inconsistencies 

■ <200K Person Targets in TKB 

■ Growth ~= 20K/year

□ Automatic extraction accelerates storage 

■ >3000K Citation Records in Nymrod Entity DB 

■ Growth ~= 1000K/year

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108 



Machine vs. Manual Chief-of-State Citations 

Nymrod (machine-extracted) Citations

#    Name                         Role                      Code         Cites        Last TKB Update
1    Abdullah Badawi              Malaysian Prime Minister  COS          > 100        10/15/2007
2    Abdullahi Yusuf              Somali President          COS          > 300        N/A
3    [illegible] (Mahmud 'Abbas)  PA President              COS          > 200        [illegible]
4    Alan Garcia                  Peruvian President        COS          > 100        N/A
5    Aleksandr Lukashenko         Belarusian President      COS          > 50         N/A
6    Alvaro Colom                 Guatemalan President      [illegible]  [illegible]  N/A
7    Alvaro Uribe                 Colombian President       COS          > 700        N/A
8    Amadou Toumani Toure         Malian President          COS          > 50         N/A
9    Angela Merkel                German Chancellor         COS          > 300        N/A
10   Bashar al-Asad               Syrian President          COS          > 800        N/A
122  Yuliya Tymoshenko            Ukrainian Prime Minister  COS          > 200        N/A




Human Language Technology


